Comparative analysis on video retrieval technique using machine learning

S. Sasireka

Department of CSE, Bannari Amman Institute of Technology, Sathyamangalam, Tamilnadu, India 638401

*Corresponding Author E-mail: sasireka@bitsathy.ac.in

Abstract:

The objective of data mining is to discover and describe interesting patterns in data. This task is especially challenging when the data consist of video sequences (which may also have audio content), because of the need to analyze enormous volumes of multidimensional data. The richness of the domain implies that many different approaches can be taken and many different tools and techniques can be used, as can be seen in the chapters of this book. They deal with clustering and categorization, cues and characters, segmentation and summarization, statistics and semantics. No attempt will be made here to force these topics into a simple framework. In the authors' own (occasionally abridged) words, the chapters deal with video browsing using multiple synchronized views; the physical setting as a video mining primitive; temporal video boundaries; video summarization using activity and audio descriptors; content analysis using multimodal information; video OCR; video categorization using semantics and semiotics; the semantics of media; statistical techniques for video analysis and searching; mining of statistical temporal structures in video; and pseudo-relevance feedback for multimedia retrieval.

KEY WORDS: Bag of features (BoF), histograms, Support Vector Machines, keypoint locations

INTRODUCTION:

Data mining is used to discover and describe interesting patterns in data. This task is especially challenging when the data consist of video sequences (which may also have audio content), because of the need to analyze enormous volumes of multidimensional data. The richness of the domain implies that many different approaches can be taken and many different tools and techniques can be used, as can be seen in the chapters of this book. They deal with clustering and categorization, cues and characters, segmentation and summarization, statistics and semantics. In the authors' own (occasionally abridged) words, the chapters deal with video browsing using multiple synchronized views; the physical setting as a video mining primitive; temporal video boundaries; video summarization using activity and audio descriptors; content analysis using multimodal information; video OCR; video categorization using semantics and semiotics; the semantics of media; statistical techniques for video analysis and searching; mining of statistical temporal structures in video; and pseudo-relevance feedback for multimedia retrieval.

Digital video is a rich medium compared to text material. It is usually accompanied by other information sources such as speech, music and closed captions. Therefore, it is important to fuse this heterogeneous information intelligently to fulfill the users’ search queries. Conventionally, the data is often indexed and retrieved by directly matching homogeneous types of data. Multimedia data, however, also contains important information related to the interaction between heterogeneous types of data, such as video and sound, a fact confirmed through human experience. We often observe that a scene may not evoke the same response of horror or sympathy, if the accompanying sound is muted. Conventional methods fail to utilize these relationships since heterogeneous data types cannot be compared directly. The challenge is to develop sophisticated techniques that fully utilize the rich source of information contained in multimedia data.

TASKS OF DATA MINING

The objective of any data mining effort falls into one of the following two types (Cha and Lewis, 2002:57) [22]:

1) Using data mining to generate descriptive models to solve problems.

2) Using data mining to generate predictive models to solve problems.

Descriptive data mining focuses on finding patterns describing the data that can be interpreted by humans, and produces new, nontrivial information based on the available data set. Predictive data mining involves using some variables or fields in the data set to predict unknown or future values of other variables of interest, and produces a model of the system described by the given data set. The goal of predictive data mining is to produce a model that can be used to perform tasks such as classification, prediction or estimation, while the goal of descriptive data mining is to gain an understanding of the analysed system by uncovering patterns and relationships in large data sets. In other words, a descriptive model finds patterns in the data and explains the relationships between the attributes the data represent, while a predictive model predicts future outcomes based on past records with known answers. The data mining task of generating models can be further divided into the following two approaches:

1) Supervised or directed data mining modelling.

2) Unsupervised or undirected data mining modelling.

In supervised or directed data mining, the task is to explain the values of some particular field. The user selects the target field and directs the computer to determine how to estimate, classify or predict its value. In unsupervised or undirected data mining, however, no variable is singled out as the target. The goal is rather to establish some relationship among all the variables in the data. The user asks the computer to identify patterns in the data that may be significant. Undirected modelling is used to explain those patterns and relationships once they have been found. The goals of predictive and descriptive data mining are achieved by using specific data mining techniques that fall within certain primary data mining tasks, which are described in the following subsections.

Classification/Prediction

Classification involves the discovery of a predictive learning function that classifies a data item into one of several predefined classes. It involves examining the features of a newly presented object and assigning it to a predefined class. Classification is a two-step process: a model is first built from examples whose classes are known, and the model is then used to classify new items. Prediction can be viewed as the construction and use of a model to assess the class of an unlabeled sample, or to assess the value or value range of an attribute that a given sample is likely to have. Any of the techniques used for classification can be adapted for use in prediction by using training examples where the value of the variable to be predicted is already known, along with historical data for those examples.

Estimation

While classification deals with discrete outcomes, such as yes or no, or platinum card, home loan or vehicle financing, estimation deals with continuously valued outcomes. Given some input data, estimation can be used to come up with a value for some unknown continuous variable, such as income or height. In estimation, one wants to come up with a plausible value or a range of plausible values for the unknown parameters of a system.
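As a minimal sketch of estimation, the following fits a straight line to known examples by ordinary least squares and then estimates a continuous value for a new input. The function names and the toy numbers are illustrative assumptions and are not taken from the study.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def estimate(model, x):
    """Estimate the continuous outcome for a new input x."""
    slope, intercept = model
    return slope * x + intercept


# Hypothetical training data: input value -> known continuous outcome.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 5.0, 7.0, 9.0]
model = fit_line(xs, ys)
estimated = estimate(model, 5.0)
print(estimated)
```

Unlike a classifier, the model returns a number on a continuous scale rather than one of a fixed set of labels.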

Segmentation

Segmentation basically means making different offers to different market segments: groups of people defined by some combination of demographic variables, such as age, gender or income. Segmentation can also be defined as a form of analysis used, for example, to break down the visitors to a website into unique groups with individual behaviours. The groups can then be used to make statistical projections, such as the potential amount of purchases they are likely to make. Typical business questions that can be answered using segmentation are: What are the different types of visitors attracted to our website? Into which age groups do the listeners of a particular radio station fall? Segmentation is grouped under descriptive data mining tasks.

Clustering

Clustering is the task of segmenting a diverse group into a number of similar subgroups or clusters. Clusters of objects are formed so that objects within a cluster have high similarity to one another, but are very dissimilar to objects in other clusters. Clustering is commonly used to search for unique groupings within a data set. The distinguishing factor between clustering and classification is that in clustering there are no predefined classes and no examples. The objects are grouped together based on self-similarity.
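Because K-means serves as the existing baseline later in this paper, a minimal sketch of it may help. The code below is a plain version of Lloyd's algorithm that alternates an assignment step and a centroid update step. The initial centroids are passed in explicitly to keep the example deterministic; the function name and the toy points are assumptions made for illustration.

```python
import math


def kmeans(points, centroids, iterations=10):
    """Lloyd's algorithm: alternate assignment and centroid update."""
    centroids = [tuple(c) for c in centroids]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        labels = [
            min(range(len(centroids)), key=lambda j: math.dist(p, centroids[j]))
            for p in points
        ]
        # Update step: each centroid moves to the mean of its members.
        for j in range(len(centroids)):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old centroid if a cluster is empty
                centroids[j] = tuple(
                    sum(axis) / len(members) for axis in zip(*members)
                )
    return labels, centroids


# Toy data: two well-separated groups, with no class labels supplied.
points = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1), (5.0, 5.0), (5.1, 4.9), (4.8, 5.2)]
labels, centroids = kmeans(points, centroids=[points[0], points[3]])
print(labels)
```

No predefined classes are given; the two groups emerge only from the distances between the points.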

Description and Visualization

The purpose of data mining is sometimes simply to describe what is going on in a complicated database in a way that increases our understanding of the people, products or processes that produced the data in the first place. A good enough description of behaviour will often suggest an explanation for it as well. One of the most powerful forms of descriptive data mining is data visualization. Although visualization is not always easy, the right picture can truly be worth a thousand words, since humans are extremely practised at extracting meaning from visual scenes. Visualization can be useful, for example, in giving a visual representation of the location and distribution of a company's major customers on a map of a city, a province or even a country.

COMPUTER VISION

Computer vision (or machine vision) is the science and technology of machines that see. Here, "see" means that the machine is able to extract information from an image, to solve some task, or perhaps "understand" the scene in either a broad or limited sense.

Applications range from (relatively) simple tasks, such as industrial machine vision systems which, say, inspect bottles speeding by on a production line, to research into artificial intelligence and computers or robots that can comprehend the world around them. As a scientific discipline, computer vision is concerned with the theory behind artificial systems that extract information from images. The image data can take many forms, such as video sequences, views from multiple cameras, or multi-dimensional data from a medical scanner.

Today, machine vision is a rapidly developing field, in both science and industry, and indeed now even in the video game world, with the first machine vision computer game systems appearing as worldwide commercial products. Examples of applications of computer vision include systems for:

Controlling processes (e.g., an industrial robot or an autonomous vehicle).

Detecting events (e.g., for visual surveillance or people counting).

Organizing information (e.g., for indexing databases of images and image sequences).

Modeling objects or environments (e.g., industrial inspection, medical image analysis or topographical modeling).

Interaction (e.g., as the input to a device for computer-human interaction).

Several strands of computer vision research are closely related to the study of biological vision, just as many strands of AI research are closely tied to research into human cognition. The field of biological vision studies and models the physiological processes behind visual perception in humans and other animals. Computer vision, on the other hand, studies and describes the processes implemented in software and hardware behind artificial vision systems. Interdisciplinary exchange between biological and computer vision has proven fruitful for both fields.

Computer vision is, in some ways, the inverse of computer graphics. While computer graphics produces image data from 3D models, computer vision often produces 3D models from image data. There is also a trend towards a combination of the two disciplines, e.g., as explored in augmented reality.

Sub-domains of computer vision include scene reconstruction, event detection, video tracking, object recognition, learning, indexing, motion estimation, and image restoration.

In our proposed method, classification using SVM is compared with the existing K-means method, and the accuracy of video retrieval is higher with SVM than with K-means. Vivek Jain et al. [4] compared cascaded SVM with conventional SVM for classifying large-scale pattern problems in content-based image retrieval and suggested that cascaded SVM works better than conventional SVM. Dinakaran et al. [1] implemented an efficient image retrieval system in which queries and retrieved results include both image and text. Textual and visual descriptors are converted into vector format for storage in the database. Images are retrieved independently with different weights and then combined in a meaningful order to produce the list of images the user wants. Chang et al. [2] proposed retrieving images using colors, standard deviation and mean, and reported results from tests on three image databases. Kekre et al. [3] suggested that color is the most commonly used feature for retrieving images.

TECHNIQUES OF DATA MINING FOR MEDICAL SCIENCES

Association: Association is one of the best-known data mining techniques. In association, a pattern is discovered based on the relationship between a particular item and other items in the same transaction.

Classification: Classification is used to assign each item in a data set to one of a predefined set of classes or groups.

Clustering: The clustering technique itself defines the classes and puts objects in them, whereas in classification objects are assigned to predefined classes.

Prediction: The prediction technique discovers the relationship between independent and dependent variables. For instance, it can be used in sales to predict future profit: if sales is taken as the independent variable, profit can be the dependent variable.

Sequential Patterns: Sequential pattern analysis identifies recurring patterns in transaction data over time. Businesses can use this information to recommend items to customers with better deals, based on their past purchasing frequency.

Decision tree: The decision tree is one of the most widely used data mining techniques because its model is easy for users to understand.

OBJECTIVE OF THE STUDY:

This study surveys the Bag of words (BoW) or Bag of features (BoF) model in video retrieval systems. For many years, video retrieval has mainly been used for browsing and searching in many applications. In recent years, large-scale video retrieval has shown great potential both as a research problem and in industrial applications.

SIFT descriptors demonstrate great discriminative power in solving vision problems such as automatically extracting information about videos. State-of-the-art large-scale video retrieval systems rely on them for different data access modalities such as browsing, searching, comparison and categorization, object recognition and video classification. Local descriptors of the video are first quantized into visual words, and a scalable indexing and retrieval process is then applied. Every video is split into short segments frame by frame.

A histogram is calculated for each frame based on the visual word dictionary; when an input query is given, the matching video frames are selected from the database. The histogram records the number of occurrences of each visual word in an image. Keypoint locations are used to ensure invariance to image location, scale and rotation. Processing is performed on the image closest in scale to the keypoint's scale.
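The histogram step can be sketched as follows. Each local descriptor is assigned to its nearest visual word, the word counts form a normalised histogram for the frame, and frames are ranked against the query histogram. The 2-D toy descriptors (real SIFT descriptors have 128 dimensions), the three-word vocabulary, the frame names and the use of histogram intersection as the similarity measure are all assumptions made for illustration; the paper does not state which measure was used.

```python
import math


def bof_histogram(descriptors, vocabulary):
    """Quantise descriptors to their nearest visual word and count them."""
    counts = [0] * len(vocabulary)
    for d in descriptors:
        word = min(range(len(vocabulary)), key=lambda j: math.dist(d, vocabulary[j]))
        counts[word] += 1
    total = sum(counts)
    # Normalise so frames with different numbers of keypoints are comparable.
    return [c / total for c in counts] if total else [0.0] * len(vocabulary)


def histogram_intersection(h1, h2):
    """Similarity in [0, 1] for normalised histograms (assumed measure)."""
    return sum(min(a, b) for a, b in zip(h1, h2))


# Hypothetical visual word dictionary and per-frame descriptors.
vocabulary = [(0.0, 0.0), (10.0, 0.0), (0.0, 10.0)]
frames = {
    "frame_a": [(1.0, 1.0), (0.0, 2.0), (9.0, 1.0), (1.0, 9.0)],
    "frame_b": [(9.0, 0.0), (10.0, 1.0), (11.0, 0.0), (0.0, 9.0)],
}
frame_histograms = {name: bof_histogram(d, vocabulary) for name, d in frames.items()}

query = bof_histogram([(0.0, 1.0), (1.0, 0.0), (10.0, 1.0), (0.0, 11.0)], vocabulary)
best_frame = max(
    frame_histograms,
    key=lambda name: histogram_intersection(query, frame_histograms[name]),
)
print(best_frame)
```

In a full system the vocabulary would itself be learned, for example by clustering descriptors from a training set, rather than written by hand.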

Algorithm 1

The Support Vector Machine (SVM) compares the positive and negative occurrences of an image. The SVM is used to retrieve the particular video from the database as the output of the process. Using this process, videos can be retrieved quickly.

In machine learning, support vector machines are supervised learning models with associated learning algorithms that analyze data and recognize patterns, and are used for classification and regression analysis. Given a set of training examples, each marked as belonging to one of two categories, an SVM training algorithm builds a model that assigns new examples to one category or the other, making it a non-probabilistic binary linear classifier. The examples of the separate categories are divided by a clear gap that is as wide as possible. New examples are mapped into the same space and predicted to belong to a category based on which side of the gap they fall on. SVMs can also perform non-linear classification using the kernel trick, which implicitly maps the inputs into a high-dimensional feature space.
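To make the idea of a binary linear classifier concrete, the sketch below trains a linear SVM by sub-gradient descent on the regularised hinge loss. This is one simple way to train such a model and is not the implementation used in the study; the hyperparameters (`lam`, `lr`, `epochs`), the function names and the 2-D toy samples are assumptions chosen for illustration.

```python
def decision_value(w, b, x):
    """Signed score w . x + b; its sign gives the side of the hyperplane."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b


def train_linear_svm(samples, labels, lam=0.01, lr=0.1, epochs=200):
    """Minimise lam/2 * |w|^2 + hinge loss with per-sample sub-gradient steps."""
    w = [0.0] * len(samples[0])
    b = 0.0
    for _ in range(epochs):
        for x, y in zip(samples, labels):
            if y * decision_value(w, b, x) < 1:
                # Inside the margin or misclassified: hinge term is active.
                w = [wi - lr * (lam * wi - y * xi) for wi, xi in zip(w, x)]
                b += lr * y
            else:
                # Outside the margin: only the regulariser shrinks w.
                w = [wi - lr * lam * wi for wi in w]
    return w, b


def predict(w, b, x):
    """Assign +1 or -1 depending on the side of the separating gap."""
    return 1 if decision_value(w, b, x) >= 0 else -1


# Toy training set: positive examples (+1) and negative examples (-1).
samples = [(2.0, 2.0), (3.0, 3.0), (2.5, 3.0), (-2.0, -2.0), (-3.0, -1.0), (-1.0, -3.0)]
labels = [1, 1, 1, -1, -1, -1]
w, b = train_linear_svm(samples, labels)
print(predict(w, b, (4.0, 4.0)), predict(w, b, (-4.0, -4.0)))
```

For retrieval, the feature vectors would be the per-frame histograms rather than 2-D points, and a kernel function could replace the dot product in `decision_value` to obtain a non-linear classifier.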

Algorithm 2

Scale-space extrema detection

Scale-space extrema detection searches over multiple scales and image locations to identify locations and scales that can be repeatably assigned under differing views of the same scene or object. The scale space of an image is a function $L(x, y, \sigma)$ that is produced from the convolution of a Gaussian kernel with the input image. This is the stage where the interest points, which are called keypoints in the SIFT framework, are detected. The image is convolved with Gaussian filters at different scales, and then the differences of successive Gaussian-blurred images are taken. Keypoints are then taken as maxima/minima of the Difference of Gaussians (DoG) that occur at multiple scales. Specifically, a DoG image $D(x, y, \sigma)$ is given by

$$D(x, y, \sigma) = L(x, y, k_i\sigma) - L(x, y, k_j\sigma),$$

where $L(x, y, k\sigma)$ is the convolution of the original image $I(x, y)$ with the Gaussian blur $G(x, y, k\sigma)$ at scale $k\sigma$, i.e.,

$$L(x, y, k\sigma) = G(x, y, k\sigma) * I(x, y).$$

Hence a DoG image between scales $k_i\sigma$ and $k_j\sigma$ is just the difference of the Gaussian-blurred images at scales $k_i\sigma$ and $k_j\sigma$. For scale-space extrema detection in the SIFT algorithm, the image is first convolved with Gaussian blurs at different scales. The convolved images are grouped by octave (an octave corresponds to doubling the value of $\sigma$), and the value of $k_i$ is selected so that a fixed number of convolved images is obtained per octave. The Difference-of-Gaussian images are then taken from adjacent Gaussian-blurred images in each octave.
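A Difference-of-Gaussians image can be sketched in a few lines: blur the image at two nearby scales and subtract. The version below uses plain Python lists, a kernel truncated at about three standard deviations and replicated borders; these choices, the function names and the 9 x 9 toy image are assumptions made for illustration, and a practical implementation would use an optimised image library.

```python
import math


def gaussian_kernel(sigma):
    """Normalised 1-D Gaussian kernel truncated at about 3 sigma."""
    radius = max(1, math.ceil(3 * sigma))
    weights = [math.exp(-(i * i) / (2 * sigma * sigma)) for i in range(-radius, radius + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def gaussian_blur(image, sigma):
    """Separable Gaussian blur (horizontal then vertical pass)."""
    kernel = gaussian_kernel(sigma)
    radius = len(kernel) // 2
    height, width = len(image), len(image[0])

    def clamp(value, upper):
        # Replicate border pixels for samples that fall outside the image.
        return max(0, min(upper, value))

    rows = [
        [sum(kernel[k + radius] * image[y][clamp(x + k, width - 1)]
             for k in range(-radius, radius + 1))
         for x in range(width)]
        for y in range(height)
    ]
    return [
        [sum(kernel[k + radius] * rows[clamp(y + k, height - 1)][x]
             for k in range(-radius, radius + 1))
         for x in range(width)]
        for y in range(height)
    ]


def difference_of_gaussians(image, sigma, k):
    """Blur at scales k*sigma and sigma, then subtract pixel by pixel."""
    coarse = gaussian_blur(image, k * sigma)
    fine = gaussian_blur(image, sigma)
    return [[c - f for c, f in zip(rc, rf)] for rc, rf in zip(coarse, fine)]


# Toy 9 x 9 image with a single bright pixel in the centre.
image = [[0.0] * 9 for _ in range(9)]
image[4][4] = 1.0
dog = difference_of_gaussians(image, sigma=1.0, k=math.sqrt(2))

# A constant image has no structure, so its DoG response is zero.
flat_dog = difference_of_gaussians([[5.0] * 9 for _ in range(9)], 1.0, math.sqrt(2))
```

At the bright pixel the coarser blur spreads the intensity more widely than the finer one, so the DoG response there is negative, which is the kind of local extremum the detector looks for.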

Once the DoG images have been obtained, keypoints are identified as local minima/maxima of the DoG images across scales. This is done by comparing each pixel in the DoG images to its eight neighbors at the same scale and the nine corresponding neighboring pixels in each of the two neighboring scales. If the pixel value is the maximum or minimum among all compared pixels, it is selected as a candidate keypoint.

This keypoint detection step is a variation of one of the blob detection methods developed by Lindeberg, which detects scale-space extrema of the scale-normalized Laplacian, that is, points that are local extrema with respect to both space and scale; in the discrete case this is done by comparisons with the nearest 26 neighbors in a discretized scale-space volume.
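The 26-neighbor comparison can be written directly. In this sketch `dog_stack` is assumed to be a list of DoG images (each a list of rows) ordered by scale, and the caller is assumed to pass only interior indices so that all neighbors exist; both the name and the strict-inequality treatment of ties are illustrative choices.

```python
def is_scale_space_extremum(dog_stack, s, y, x):
    """True if pixel (s, y, x) is strictly above or below all 26 neighbours."""
    value = dog_stack[s][y][x]
    neighbours = [
        dog_stack[s + ds][y + dy][x + dx]
        for ds in (-1, 0, 1)
        for dy in (-1, 0, 1)
        for dx in (-1, 0, 1)
        if (ds, dy, dx) != (0, 0, 0)  # skip the pixel itself
    ]
    return value > max(neighbours) or value < min(neighbours)


# Toy stack of three 3 x 3 DoG images with a peak in the middle layer.
stack = [[[0.0] * 3 for _ in range(3)] for _ in range(3)]
stack[1][1][1] = 1.0
peak = is_scale_space_extremum(stack, 1, 1, 1)

# A flat stack has no extremum because every neighbour ties with the centre.
flat = [[[0.0] * 3 for _ in range(3)] for _ in range(3)]
no_peak = is_scale_space_extremum(flat, 1, 1, 1)
```

Eight neighbors come from the same scale and nine from each adjacent scale, which gives the 8 + 9 + 9 = 26 comparisons.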

SIFT DESCRIPTOR

Keypoints of an object are first extracted from a set of reference images and stored in a database. An object is recognized in a new image by individually comparing each feature from the new image to this database and finding candidate matching features based on their feature vectors.

For any object in an image, interesting points on the object can be extracted to provide a "feature description" of the object. This description, extracted from a training image, can then be used to identify the object when attempting to locate it in a test image containing many other objects. For reliable recognition, the features extracted from the training image must be detectable even under changes in image scale, noise and illumination. Such points usually lie on high-contrast regions of the image, such as object edges. Each cluster of three or more features that agree on an object and its pose is then subject to further detailed model verification, and outliers are subsequently discarded. Object matches that pass all these tests can be identified as correct with high confidence.

Image feature generation transforms an image into a large collection of feature vectors, each of which is invariant to image translation, scaling and rotation, partially invariant to illumination changes, and robust to local geometric distortion.

Orientation assignment

Each keypoint is assigned one or more orientations based on local image gradient directions. This is the key step in achieving invariance to rotation, as the keypoint descriptor can be represented relative to this orientation and therefore achieve invariance to image rotation.

First, the Gaussian-smoothed image $L(x, y, \sigma)$ at the keypoint's scale $\sigma$ is taken so that all computations are performed in a scale-invariant manner. For an image sample $L(x, y)$ at scale $\sigma$, the gradient magnitude $m(x, y)$ and orientation $\theta(x, y)$ are precomputed using pixel differences:

$$m(x, y) = \sqrt{(L(x+1, y) - L(x-1, y))^2 + (L(x, y+1) - L(x, y-1))^2}$$

$$\theta(x, y) = \operatorname{atan2}\big(L(x, y+1) - L(x, y-1),\; L(x+1, y) - L(x-1, y)\big)$$
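The pixel-difference computation can be sketched as follows. The smoothed image is assumed to be stored as a list of rows indexed as `smoothed[y][x]`, and the horizontal and vertical central differences give the magnitude and orientation at an interior pixel; the function name and the toy values are illustrative.

```python
import math


def gradient_at(smoothed, x, y):
    """Gradient magnitude and orientation from central pixel differences."""
    dx = smoothed[y][x + 1] - smoothed[y][x - 1]
    dy = smoothed[y + 1][x] - smoothed[y - 1][x]
    magnitude = math.hypot(dx, dy)
    orientation = math.atan2(dy, dx)  # radians in (-pi, pi]
    return magnitude, orientation


# Toy 3 x 3 patch chosen so that dx = 3 and dy = 4 at the centre pixel.
patch = [
    [0.0, 0.0, 0.0],
    [0.0, 0.0, 3.0],
    [0.0, 4.0, 0.0],
]
magnitude, orientation = gradient_at(patch, 1, 1)
print(magnitude, orientation)
```

Using `atan2` rather than a plain arctangent keeps the full 360-degree range of directions, which the orientation histogram needs.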

Keypoint Descriptors

The gradient information is rotated to line up with the orientation of the keypoint and then weighted by a Gaussian with a variance of 1.5 * keypoint scale.

The keypoint descriptor data is used to create a set of histograms over a window centered on the keypoint. It uses a set of 16 histograms aligned in a 4 x 4 grid, each with 8 orientation bins. The input values for each histogram are computed from the image cells of the window. Since there are 4 x 4 = 16 histograms, each with 8 bins, the vector has 128 elements.
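The 4 x 4 x 8 layout can be illustrated with a simplified descriptor builder. It takes precomputed gradient magnitudes and orientations for a 16 x 16 window, accumulates them into one 8-bin histogram per 4 x 4-pixel cell and normalises the result. The Gaussian weighting and the interpolation between bins used by full SIFT are deliberately omitted, so this is a sketch of the layout only; the function name and the toy window are assumptions.

```python
import math


def sift_like_descriptor(magnitudes, orientations, grid=4, bins=8):
    """Concatenate grid x grid orientation histograms into one vector."""
    size = len(magnitudes)          # window side length, e.g. 16
    cell = size // grid             # pixels per cell side, e.g. 4
    descriptor = [0.0] * (grid * grid * bins)
    for y in range(size):
        for x in range(size):
            # Map the angle to [0, 2*pi) and then to one of the bins.
            angle = orientations[y][x] % (2 * math.pi)
            bin_index = int(angle / (2 * math.pi) * bins) % bins
            cell_index = (y // cell) * grid + (x // cell)
            descriptor[cell_index * bins + bin_index] += magnitudes[y][x]
    # Normalise to unit length to reduce sensitivity to contrast changes.
    norm = math.sqrt(sum(v * v for v in descriptor))
    return [v / norm for v in descriptor] if norm else descriptor


# Toy window: unit gradients that all point along orientation 0.
magnitudes = [[1.0] * 16 for _ in range(16)]
orientations = [[0.0] * 16 for _ in range(16)]
descriptor = sift_like_descriptor(magnitudes, orientations)
print(len(descriptor))
```

With this toy input every cell contributes only to its first orientation bin, so 16 of the 128 entries are non-zero.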

COMPARISON OF TWO METHODS

Several experiments were performed in MATLAB to evaluate the impact of different types of similar videos. The Support Vector Machine (SVM) is applied to similar videos, from which the correct positions and frames are retrieved. Automatic extraction of similar videos using a video query clip based on SVM contributes to research on video modelling and similar-video extraction. The similar-video extraction process is performed automatically, and the (VISCOM) video model is additionally used to perform the extraction for comparison. The accuracy rate and retrieval time of the Support Vector Machine and the VISCOM model are compared when retrieving similar videos from the database. As shown in Table 1, the proposed method achieves higher accuracy and lower retrieval time than the existing method.

Table 1: Comparison of proposed method with existing method

Video	SVM Accuracy	SVM Time	K-Means Accuracy	K-Means Time
Sports	79.23	5.43	68.51	14.62
News	91.64	1.54	85.4	4.24
Entertainment	85.67	2.28	78.90	10.82
Cartoon	93.11	3.22	86.47	4.08
Average	87.74	3.62	79.90	8.74

Table 2: Accuracy comparison with existing method

	Matching Accuracy	Time	Memory
Proposed	98.59	95.88	98.02
Proposed	99.22	96.3	98.2
Proposed	98.5	96.9	99.4
Proposed	98.5	96.4	98.5
Existing	40.61	70.11	61.31
Existing	40.17	70.53	60.27
Existing	40.27	70.32	60.18
Existing	401.4	70.18	60.52

Besides, the search time of other algorithms grows faster than that of the proposed approach. This is because the proposed search is directed to the corresponding database CIT, whose volume is not proportional to that of the video database. According to the experimental results, the proposed method can greatly improve the efficiency of video similarity search in a large database. The accuracy chart of the proposed and existing methods is shown in the figure.

Given a query video, the proposed method can find more similar video clips with similar shots than algorithm (1). Therefore, better VSS performance can be obtained by the proposed method, with satisfactory recall and precision rates. The results of average search time versus the number of videos are shown in Fig. 4.
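For reference, precision and recall for a single query can be computed as in the sketch below. The clip identifiers are hypothetical and the numbers are not results from this study.

```python
def precision_recall(retrieved, relevant):
    """Precision = hits / retrieved, recall = hits / relevant."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical query: four clips returned, three clips truly similar.
precision, recall = precision_recall(
    retrieved=["clip_a", "clip_b", "clip_c", "clip_d"],
    relevant=["clip_a", "clip_c", "clip_e"],
)
print(precision, recall)
```

Precision penalises irrelevant clips in the returned list, while recall penalises similar clips that the search missed.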

CONCLUSION:

The framework proposes an effective query processing strategy for temporal localization of similar content from a long unsegmented video stream using the support vector machine algorithm, by considering the target subsequence. The Bag of features model quantizes local descriptors into visual words which form a vocabulary, and then applies a scalable retrieval scheme. The SVM value of a query image is 0.5, and the value should be greater than the values of the input query clip. The query clip image is used as the input image, and using the query image the corresponding video can be retrieved from the database. In the image retrieval task, the Bag of features (BoF) model relies on local descriptors such as SIFT: the quantized local descriptors serve as visual words that form a vocabulary, to which scalable textual and video search techniques are then applied.

REFERENCES:

[1] Dinakaran. D, J. Annapurna, Ch. Aswani Kumar, "Interactive Image Retrieval Using Text and Image Content", Cybernetics and Information Technologies, vol. 10, no. 3, pp. 20-30, 2010.

[2] Chin-Chen Chang and Tzu-Chuen Lu, "A Color-Based Image Retrieval Method Using Color Distribution and Common Bitmap", Springer, pp. 56-71, 2005.

[3] Kekre H.B, Sudeep D. Thepade, Tanuja K. Sarode and Shrikant P. Sanas, "Image retrieval using texture features extracted using LBG, KPE, KFCG, KMCG, KEVR with assorted Color spaces", International Journal of Advances in Engineering & Technology, vol. 2, issue 1, pp. 520-531, 2012.

[4] Vivek Jain, Neha Sahu, "A Survey: On Content Based Image Retrieval", International Journal of Engineering Research and Applications (IJERA), ISSN: 2248-9622, vol. 3, issue 4, pp. 1166-1169, Jul-Aug 2013.

[5] V. Mohana Maniganda Babu, Dr. T. Santha, "Efficient Brightness Adaptive Deep-Sea Image Stitching using Biorthogonal Multi-Wavelet Transform and Harris Algorithm", IEEE International Conference on Intelligent Computing and Control (I2C2 17), ISBN: 978-1-4673-9916-6, 23-24 June 2017.

[6] V. Mohana Maniganda Babu, Dr. T. Santha, "Efficient Brightness Adaptive Deep-Sea Image Stitching using Biorthogonal Multi-Wavelet Transform and Harris Algorithm", IEEE International Conference on Intelligent Computing and Control (I2C2 17), ISBN: 978-1-5386-0374-1, 23-24 June 2017.

[7] Sakthi Sivkumar. V, "Organic On-Field Research Combined with Information Technology for a Fast Uptake, Sustainable and Profitable Future Using Both Macro and Micro Analysis of The System as a Whole", (ORGATROP 2017) International Conference on Organic Agriculture in the Tropics: State-of-the-Art, Challenges and Opportunities, Yogyakarta, Indonesia, August 20-24, 2017.

[8] Sakthi Sivakumar. V, "Applying both modern and ancient Management principles and planning to optimize the Organic sector from field level to increase the Sustainability and Profitability", (OWC 2017) Organic World Congress, Greater Noida, India, Nov 09-11, 2017.

[9] M Abhayadev, Dr. T Santha, "Object Boundary Identification using Enhanced High Pass Frequency Filtering Algorithm and Morphological Erosion Structuring Element", Journal of Scientific & Industrial Research (SCIE), vol. 76, pp. 620-625, October 2017.

Received on 08.05.2019 Modified on 10.05.2019

Research J. Science and Tech. 2019; 11(2):148-154.

DOI: 10.5958/2349-2988.2019.00022.6